
4 research outputs found

Indian Legal NLP Benchmarks: A Survey

Author: Raghavan Vivek, Ph.D.
Venugopalan Janani, Ph.D.
Kalamkar Prathamesh
Publication date: 13/07/2021

Availability of challenging benchmarks is key to advancing AI in a specific field. Since legal text differs significantly from ordinary English text, there is a need to create separate Natural Language Processing benchmarks for Indian legal text that are challenging and focus on tasks specific to legal systems. This will spur innovation in applications of Natural Language Processing to Indian legal text and will benefit both the AI community and the legal fraternity. We review the existing work in this area and propose ideas for creating new benchmarks for Indian Legal Natural Language Processing.

arXiv.org e-Print Archive

Named Entity Recognition in Indian court judgments

Author: Agarwal Astha
Gupta Smita
Kalamkar Prathamesh
Karn Saurabh
Raghavan Vivek
Tiwari Aman
Publication date: 07/11/2022

Identification of named entities in legal texts is an essential building block for developing other legal Artificial Intelligence applications. Named entities in legal texts differ slightly from, and are more fine-grained than, commonly used named entities such as Person, Organization, and Location. In this paper, we introduce a new corpus of 46,545 annotated legal named entities mapped to 14 legal entity types. A baseline model for extracting legal named entities from judgment text is also developed. Comment: to be published in NLLP 2022 Workshop at EMNL

arXiv.org e-Print Archive

Corpus for Automatic Structuring of Legal Documents

Author: Agarwal Astha
Gupta Smita
Kalamkar Prathamesh
Karn Saurabh
Modi Ashutosh
Raghavan Vivek
Tiwari Aman
Publication date: 19/09/2022

In populous countries, pending legal cases have been growing exponentially. There is a need to develop techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated with a label from a list of predefined Rhetorical Roles. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. Further, we show how rhetorical roles can be applied to improve performance on the tasks of summarization and legal judgment prediction. We release the corpus and baseline model code along with the paper. Comment: Accepted at LREC 2022, 10 Pages (8 page main paper + 2 page references

arXiv.org e-Print Archive

SemEval 2023 Task 6: LegalEval -- Understanding Legal Texts

Author: Guha Shouvik Kumar
Joshi Abhinav
Kalamkar Prathamesh
Karn Saurabh
Malhan Sachin
Modi Ashutosh
Raghavan Vivek
Tanikella Sai Kiran
Tiwari Aman
Publication date: 19/04/2023

In populous countries, pending legal cases have been growing exponentially. There is a need to develop NLP-based techniques for processing and automatically understanding legal documents. To promote research in Legal NLP, we organized the shared task LegalEval - Understanding Legal Texts at SemEval 2023. LegalEval has three sub-tasks: Task-A (Rhetorical Roles Labeling) is about automatically structuring legal documents into semantically coherent units; Task-B (Legal Named Entity Recognition) deals with identifying relevant entities in a legal document; and Task-C (Court Judgement Prediction with Explanation) explores automatically predicting the outcome of a legal case along with an explanation for the prediction. In total, 26 teams (approximately 100 participants from around the world) submitted system papers. In each sub-task, the proposed systems outperformed the baselines; however, there is still considerable room for improvement. This paper describes the tasks and analyzes the techniques proposed by the various teams. Comment: 13 Pages (9 Pages + References), Accepted at SemEval 202

arXiv.org e-Print Archive